{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Pandas字符串处理\n",
    "\n",
    "前面我们已经使用了字符串的处理函数：  \n",
    "df[\"bWendu\"].str.replace(\"℃\", \"\").astype('int32')\n",
    "\n",
    "***Pandas的字符串处理：***  \n",
    "1. 使用方法：先获取Series的str属性，然后在属性上调用函数；\n",
    "2. 只能在字符串列上使用，不能数字列上使用；\n",
    "3. Dataframe上没有str属性和处理方法\n",
    "4. Series.str并不是Python原生字符串，而是自己的一套方法，不过大部分和原生str很相似；\n",
    "\n",
    "***Series.str字符串方法列表参考文档:***  \n",
    "https://pandas.pydata.org/pandas-docs/stable/reference/series.html#string-handling\n",
    "  \n",
    "  \n",
    "***本节演示内容：***  \n",
    "1. 获取Series的str属性，然后使用各种字符串处理函数\n",
    "2. 使用str的startswith、contains等bool类Series可以做条件查询\n",
    "3. 需要多次str处理的链式操作\n",
    "4. 使用正则表达式的处理"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 0、读取北京2018年天气数据"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "fpath = \"./datas/beijing_tianqi/beijing_tianqi_2018.csv\"\n",
    "df = pd.read_csv(fpath)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ymd</th>\n",
       "      <th>bWendu</th>\n",
       "      <th>yWendu</th>\n",
       "      <th>tianqi</th>\n",
       "      <th>fengxiang</th>\n",
       "      <th>fengli</th>\n",
       "      <th>aqi</th>\n",
       "      <th>aqiInfo</th>\n",
       "      <th>aqiLevel</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>2018-01-01</td>\n",
       "      <td>3℃</td>\n",
       "      <td>-6℃</td>\n",
       "      <td>晴~多云</td>\n",
       "      <td>东北风</td>\n",
       "      <td>1-2级</td>\n",
       "      <td>59</td>\n",
       "      <td>良</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>2018-01-02</td>\n",
       "      <td>2℃</td>\n",
       "      <td>-5℃</td>\n",
       "      <td>阴~多云</td>\n",
       "      <td>东北风</td>\n",
       "      <td>1-2级</td>\n",
       "      <td>49</td>\n",
       "      <td>优</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>2018-01-03</td>\n",
       "      <td>2℃</td>\n",
       "      <td>-5℃</td>\n",
       "      <td>多云</td>\n",
       "      <td>北风</td>\n",
       "      <td>1-2级</td>\n",
       "      <td>28</td>\n",
       "      <td>优</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>2018-01-04</td>\n",
       "      <td>0℃</td>\n",
       "      <td>-8℃</td>\n",
       "      <td>阴</td>\n",
       "      <td>东北风</td>\n",
       "      <td>1-2级</td>\n",
       "      <td>28</td>\n",
       "      <td>优</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>2018-01-05</td>\n",
       "      <td>3℃</td>\n",
       "      <td>-6℃</td>\n",
       "      <td>多云~晴</td>\n",
       "      <td>西北风</td>\n",
       "      <td>1-2级</td>\n",
       "      <td>50</td>\n",
       "      <td>优</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          ymd bWendu yWendu tianqi fengxiang fengli  aqi aqiInfo  aqiLevel\n",
       "0  2018-01-01     3℃    -6℃   晴~多云       东北风   1-2级   59       良         2\n",
       "1  2018-01-02     2℃    -5℃   阴~多云       东北风   1-2级   49       优         1\n",
       "2  2018-01-03     2℃    -5℃     多云        北风   1-2级   28       优         1\n",
       "3  2018-01-04     0℃    -8℃      阴       东北风   1-2级   28       优         1\n",
       "4  2018-01-05     3℃    -6℃   多云~晴       西北风   1-2级   50       优         1"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "ymd          object\n",
       "bWendu       object\n",
       "yWendu       object\n",
       "tianqi       object\n",
       "fengxiang    object\n",
       "fengli       object\n",
       "aqi           int64\n",
       "aqiInfo      object\n",
       "aqiLevel      int64\n",
       "dtype: object"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.dtypes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1、获取Series的str属性，使用各种字符串处理函数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<pandas.core.strings.StringMethods at 0x1af21871808>"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[\"bWendu\"].str"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0       3\n",
       "1       2\n",
       "2       2\n",
       "3       0\n",
       "4       3\n",
       "       ..\n",
       "360    -5\n",
       "361    -3\n",
       "362    -3\n",
       "363    -2\n",
       "364    -2\n",
       "Name: bWendu, Length: 365, dtype: object"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 字符串替换函数\n",
    "df[\"bWendu\"].str.replace(\"℃\", \"\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      False\n",
       "1      False\n",
       "2      False\n",
       "3      False\n",
       "4      False\n",
       "       ...  \n",
       "360    False\n",
       "361    False\n",
       "362    False\n",
       "363    False\n",
       "364    False\n",
       "Name: bWendu, Length: 365, dtype: bool"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 判断是不是数字\n",
    "df[\"bWendu\"].str.isnumeric()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "ename": "AttributeError",
     "evalue": "Can only use .str accessor with string values!",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[1;31mAttributeError\u001b[0m                            Traceback (most recent call last)",
      "\u001b[1;32m<ipython-input-8-12cdcbdb6f81>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[0mdf\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;34m\"aqi\"\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mstr\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mlen\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[1;32md:\\appdata\\python37\\lib\\site-packages\\pandas\\core\\generic.py\u001b[0m in \u001b[0;36m__getattr__\u001b[1;34m(self, name)\u001b[0m\n\u001b[0;32m   5173\u001b[0m             \u001b[1;32mor\u001b[0m \u001b[0mname\u001b[0m \u001b[1;32min\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_accessors\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m   5174\u001b[0m         ):\n\u001b[1;32m-> 5175\u001b[1;33m             \u001b[1;32mreturn\u001b[0m \u001b[0mobject\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m__getattribute__\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mname\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m   5176\u001b[0m         \u001b[1;32melse\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m   5177\u001b[0m             \u001b[1;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_info_axis\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_can_hold_identifiers_and_holds_name\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mname\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;32md:\\appdata\\python37\\lib\\site-packages\\pandas\\core\\accessor.py\u001b[0m in \u001b[0;36m__get__\u001b[1;34m(self, obj, cls)\u001b[0m\n\u001b[0;32m    173\u001b[0m             \u001b[1;31m# we're accessing the attribute of the class, i.e., Dataset.geo\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m    174\u001b[0m             \u001b[1;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_accessor\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 175\u001b[1;33m         \u001b[0maccessor_obj\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_accessor\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mobj\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m    176\u001b[0m         \u001b[1;31m# Replace the property with the accessor object. Inspired by:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m    177\u001b[0m         \u001b[1;31m# http://www.pydanny.com/cached-property.html\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;32md:\\appdata\\python37\\lib\\site-packages\\pandas\\core\\strings.py\u001b[0m in \u001b[0;36m__init__\u001b[1;34m(self, data)\u001b[0m\n\u001b[0;32m   1915\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m   1916\u001b[0m     \u001b[1;32mdef\u001b[0m \u001b[0m__init__\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mdata\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 1917\u001b[1;33m         \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_inferred_dtype\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_validate\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mdata\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m   1918\u001b[0m         \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_is_categorical\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mis_categorical_dtype\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mdata\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m   1919\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;32md:\\appdata\\python37\\lib\\site-packages\\pandas\\core\\strings.py\u001b[0m in \u001b[0;36m_validate\u001b[1;34m(data)\u001b[0m\n\u001b[0;32m   1965\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m   1966\u001b[0m         \u001b[1;32mif\u001b[0m \u001b[0minferred_dtype\u001b[0m \u001b[1;32mnot\u001b[0m \u001b[1;32min\u001b[0m \u001b[0mallowed_types\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 1967\u001b[1;33m             \u001b[1;32mraise\u001b[0m \u001b[0mAttributeError\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m\"Can only use .str accessor with string \"\u001b[0m \u001b[1;34m\"values!\"\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m   1968\u001b[0m         \u001b[1;32mreturn\u001b[0m \u001b[0minferred_dtype\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m   1969\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;31mAttributeError\u001b[0m: Can only use .str accessor with string values!"
     ]
    }
   ],
   "source": [
    "df[\"aqi\"].str.len()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2、使用str的startswith、contains等得到bool的Series可以做条件查询"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "condition = df[\"ymd\"].str.startswith(\"2018-03\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      False\n",
       "1      False\n",
       "2      False\n",
       "3      False\n",
       "4      False\n",
       "       ...  \n",
       "360    False\n",
       "361    False\n",
       "362    False\n",
       "363    False\n",
       "364    False\n",
       "Name: ymd, Length: 365, dtype: bool"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "condition"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ymd</th>\n",
       "      <th>bWendu</th>\n",
       "      <th>yWendu</th>\n",
       "      <th>tianqi</th>\n",
       "      <th>fengxiang</th>\n",
       "      <th>fengli</th>\n",
       "      <th>aqi</th>\n",
       "      <th>aqiInfo</th>\n",
       "      <th>aqiLevel</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>59</td>\n",
       "      <td>2018-03-01</td>\n",
       "      <td>8℃</td>\n",
       "      <td>-3℃</td>\n",
       "      <td>多云</td>\n",
       "      <td>西南风</td>\n",
       "      <td>1-2级</td>\n",
       "      <td>46</td>\n",
       "      <td>优</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>60</td>\n",
       "      <td>2018-03-02</td>\n",
       "      <td>9℃</td>\n",
       "      <td>-1℃</td>\n",
       "      <td>晴~多云</td>\n",
       "      <td>北风</td>\n",
       "      <td>1-2级</td>\n",
       "      <td>95</td>\n",
       "      <td>良</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>61</td>\n",
       "      <td>2018-03-03</td>\n",
       "      <td>13℃</td>\n",
       "      <td>3℃</td>\n",
       "      <td>多云~阴</td>\n",
       "      <td>北风</td>\n",
       "      <td>1-2级</td>\n",
       "      <td>214</td>\n",
       "      <td>重度污染</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>62</td>\n",
       "      <td>2018-03-04</td>\n",
       "      <td>7℃</td>\n",
       "      <td>-2℃</td>\n",
       "      <td>阴~多云</td>\n",
       "      <td>东南风</td>\n",
       "      <td>1-2级</td>\n",
       "      <td>144</td>\n",
       "      <td>轻度污染</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>63</td>\n",
       "      <td>2018-03-05</td>\n",
       "      <td>8℃</td>\n",
       "      <td>-3℃</td>\n",
       "      <td>晴</td>\n",
       "      <td>南风</td>\n",
       "      <td>1-2级</td>\n",
       "      <td>94</td>\n",
       "      <td>良</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           ymd bWendu yWendu tianqi fengxiang fengli  aqi aqiInfo  aqiLevel\n",
       "59  2018-03-01     8℃    -3℃     多云       西南风   1-2级   46       优         1\n",
       "60  2018-03-02     9℃    -1℃   晴~多云        北风   1-2级   95       良         2\n",
       "61  2018-03-03    13℃     3℃   多云~阴        北风   1-2级  214    重度污染         5\n",
       "62  2018-03-04     7℃    -2℃   阴~多云       东南风   1-2级  144    轻度污染         3\n",
       "63  2018-03-05     8℃    -3℃      晴        南风   1-2级   94       良         2"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[condition].head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3、需要多次str处理的链式操作"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "怎样提取201803这样的数字月份？  \n",
    "1、先将日期2018-03-31替换成20180331的形式  \n",
    "2、提取月份字符串201803  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      20180101\n",
       "1      20180102\n",
       "2      20180103\n",
       "3      20180104\n",
       "4      20180105\n",
       "         ...   \n",
       "360    20181227\n",
       "361    20181228\n",
       "362    20181229\n",
       "363    20181230\n",
       "364    20181231\n",
       "Name: ymd, Length: 365, dtype: object"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[\"ymd\"].str.replace(\"-\", \"\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "ename": "AttributeError",
     "evalue": "'Series' object has no attribute 'slice'",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[1;31mAttributeError\u001b[0m                            Traceback (most recent call last)",
      "\u001b[1;32m<ipython-input-13-ae278fb12255>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m\u001b[0m\n\u001b[0;32m      1\u001b[0m \u001b[1;31m# 每次调用函数，都返回一个新Series\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 2\u001b[1;33m \u001b[0mdf\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;34m\"ymd\"\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mstr\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mreplace\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m\"-\"\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;34m\"\"\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mslice\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;36m0\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;36m6\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[1;32md:\\appdata\\python37\\lib\\site-packages\\pandas\\core\\generic.py\u001b[0m in \u001b[0;36m__getattr__\u001b[1;34m(self, name)\u001b[0m\n\u001b[0;32m   5177\u001b[0m             \u001b[1;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_info_axis\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_can_hold_identifiers_and_holds_name\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mname\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m   5178\u001b[0m                 \u001b[1;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m[\u001b[0m\u001b[0mname\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 5179\u001b[1;33m             \u001b[1;32mreturn\u001b[0m \u001b[0mobject\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m__getattribute__\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mname\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m   5180\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m   5181\u001b[0m     \u001b[1;32mdef\u001b[0m \u001b[0m__setattr__\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mname\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mvalue\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;31mAttributeError\u001b[0m: 'Series' object has no attribute 'slice'"
     ]
    }
   ],
   "source": [
    "# 每次调用函数，都返回一个新Series\n",
    "df[\"ymd\"].str.replace(\"-\", \"\").slice(0, 6)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      201801\n",
       "1      201801\n",
       "2      201801\n",
       "3      201801\n",
       "4      201801\n",
       "        ...  \n",
       "360    201812\n",
       "361    201812\n",
       "362    201812\n",
       "363    201812\n",
       "364    201812\n",
       "Name: ymd, Length: 365, dtype: object"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[\"ymd\"].str.replace(\"-\", \"\").str.slice(0, 6)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      201801\n",
       "1      201801\n",
       "2      201801\n",
       "3      201801\n",
       "4      201801\n",
       "        ...  \n",
       "360    201812\n",
       "361    201812\n",
       "362    201812\n",
       "363    201812\n",
       "364    201812\n",
       "Name: ymd, Length: 365, dtype: object"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# slice就是切片语法，可以直接用\n",
    "df[\"ymd\"].str.replace(\"-\", \"\").str[0:6]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4. 使用正则表达式的处理\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 添加新列\n",
    "def get_nianyueri(x):\n",
    "    year,month,day = x[\"ymd\"].split(\"-\")\n",
    "    return f\"{year}年{month}月{day}日\"\n",
    "df[\"中文日期\"] = df.apply(get_nianyueri, axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      2018年01月01日\n",
       "1      2018年01月02日\n",
       "2      2018年01月03日\n",
       "3      2018年01月04日\n",
       "4      2018年01月05日\n",
       "          ...     \n",
       "360    2018年12月27日\n",
       "361    2018年12月28日\n",
       "362    2018年12月29日\n",
       "363    2018年12月30日\n",
       "364    2018年12月31日\n",
       "Name: 中文日期, Length: 365, dtype: object"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[\"中文日期\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "问题：怎样将“2018年12月31日”中的年、月、日三个中文字符去除？"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      20180101\n",
       "1      20180102\n",
       "2      20180103\n",
       "3      20180104\n",
       "4      20180105\n",
       "         ...   \n",
       "360    20181227\n",
       "361    20181228\n",
       "362    20181229\n",
       "363    20181230\n",
       "364    20181231\n",
       "Name: 中文日期, Length: 365, dtype: object"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 方法1：链式replace\n",
    "df[\"中文日期\"].str.replace(\"年\", \"\").str.replace(\"月\",\"\").str.replace(\"日\", \"\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "***Series.str默认就开启了正则表达式模式***"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      20180101\n",
       "1      20180102\n",
       "2      20180103\n",
       "3      20180104\n",
       "4      20180105\n",
       "         ...   \n",
       "360    20181227\n",
       "361    20181228\n",
       "362    20181229\n",
       "363    20181230\n",
       "364    20181231\n",
       "Name: 中文日期, Length: 365, dtype: object"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 方法2：正则表达式替换\n",
    "df[\"中文日期\"].str.replace(\"[年月日]\", \"\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
